Python for Data Science
Python for IT
General advice
Conclusions
>>> print(self)
Jupyter
It's a notebook!
nbconvert or from the UI
It's interactive!
In [30]:
from ipywidgets import interact, fixed
In [34]:
from sympy import init_printing, Symbol, Eq, factor
init_printing(use_latex=True)
x = Symbol('x')
def factorit(n):
    return Eq(x**n-1, factor(x**n-1))
In [35]:
interact(factorit, n=(2,40))
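Outside the notebook, the same factorisation can be checked without widgets. The following is a minimal sketch, assuming only that SymPy is installed; the loop values are arbitrary and this `factorit` returns the factored expression directly instead of an `Eq`.

```python
from sympy import Symbol, expand, factor

x = Symbol('x')

def factorit(n):
    """Return x**n - 1 factored over the rationals (SymPy's default)."""
    return factor(x**n - 1)

for n in (2, 3, 4, 6):
    factored = factorit(n)
    # Expanding the product must give back the original polynomial.
    assert expand(factored) == x**n - 1
    print(n, factored)
```

The slider in the notebook simply calls the function again with a new `n` each time it moves.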
In [28]:
# Import matplotlib (plotting), skimage (image processing) and interact (user interfaces)
# This enables their use in the Notebook.
%matplotlib inline
from matplotlib import pyplot as plt
from skimage import data
from skimage.feature import blob_doh
from skimage.color import rgb2gray
# Extract the first 500px square of the Hubble Deep Field.
image = data.hubble_deep_field()[0:500, 0:500]
image_gray = rgb2gray(image)
def plot_blobs(max_sigma=30, threshold=0.1, gray=False):
    """
    Plot the image and the blobs that have been found.
    """
    blobs = blob_doh(image_gray, max_sigma=max_sigma, threshold=threshold)
    fig, ax = plt.subplots(figsize=(8,8))
    ax.set_title('Galaxies in the Hubble Deep Field')
    if gray:
        ax.imshow(image_gray, interpolation='nearest', cmap='gray_r')
        circle_color = 'red'
    else:
        ax.imshow(image, interpolation='nearest')
        circle_color = 'yellow'
    for blob in blobs:
        y, x, r = blob
        c = plt.Circle((x, y), r, color=circle_color, linewidth=2, fill=False)
        ax.add_patch(c)
In [29]:
interact(plot_blobs, max_sigma=(10, 40, 2), threshold=(0.005, 0.02, 0.001))
It's highly extensible!
In [8]:
import numpy as np
In [9]:
my_list = list(range(0,100000))
res1 = %timeit -o sum(my_list)
In [10]:
array = np.arange(0, 100000)
res2 = %timeit -o np.sum(array)
In [11]:
res1.best / res2.best
Out[11]:
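The `%timeit` magic only works inside IPython. As a rough equivalent for a plain script, here is a sketch using the standard library's `timeit` module; the helper name `best_time` and the `repeat`/`number` values are illustrative choices, and the measured ratio depends on the machine.

```python
import timeit

import numpy as np

N = 100_000
my_list = list(range(N))
# An explicit 64-bit dtype avoids overflow on platforms whose default integer is 32-bit.
array = np.arange(N, dtype=np.int64)

def best_time(func, repeat=5, number=20):
    """Return the best per-call time in seconds, similar to the .best attribute of %timeit -o."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number

t_list = best_time(lambda: sum(my_list))
t_numpy = best_time(lambda: np.sum(array))
speedup = t_list / t_numpy

print(f"sum(list): {t_list:.2e} s, np.sum(array): {t_numpy:.2e} s, ratio: {speedup:.1f}")
```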
NumPy is much more:
General purpose scientific computing library
scipy.linalg: ATLAS LAPACK and BLAS libraries
scipy.stats: distributions, statistical functions...
scipy.integrate: integration of functions and ODEs
scipy.optimize: local and global optimization, fitting, root finding...
scipy.interpolate: interpolation, splines...
scipy.fftpack: Fourier transforms
scipy.signal: Signal processing
scipy.special: Special functions
scipy.io: Reading/Writing scientific formats
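To give a flavour of two of these submodules, here is a minimal sketch assuming SciPy is installed. The integrand and the equation being solved are arbitrary examples, not taken from the talk.

```python
import numpy as np
from scipy import integrate, optimize

# scipy.integrate: definite integral of sin(t) from 0 to pi (exact value is 2).
area, abs_error = integrate.quad(np.sin, 0, np.pi)

# scipy.optimize: root of cos(t) - t on [0, 1], where the function changes sign.
root = optimize.brentq(lambda t: np.cos(t) - t, 0, 1)

print(f"integral = {area:.6f} (estimated error {abs_error:.1e})")
print(f"root of cos(t) = t: {root:.6f}")
```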
In [2]:
# This line integrates matplotlib with the notebook
%matplotlib inline
import matplotlib.pyplot as plt
In [3]:
import numpy as np
x = np.linspace(-2, 10)
plt.plot(x, np.sin(x) / x)
Out[3]:
In [4]:
def g(x, y):
    return np.cos(x) + np.sin(y) ** 2
x = np.linspace(-2, 3, 1000)
y = np.linspace(-2, 3, 1000)
xx, yy = np.meshgrid(x, y)
zz = g(xx, yy)
fig = plt.figure(figsize=(6, 6))
cs = plt.contourf(xx, yy, zz, np.linspace(-1, 2, 13), cmap=plt.cm.viridis)
plt.colorbar()
cs = plt.contour(xx, yy, zz, np.linspace(-1, 2, 13), colors='k')
plt.clabel(cs)
plt.xlabel("x")
plt.ylabel("y")
plt.title(r"Function $g(x, y) = \cos{x} + \sin^2{y}$")
plt.close()
In [5]:
fig
Out[5]:
There are many alternatives to matplotlib, each one with its use cases, design decisions, and tradeoffs. Here are some of them:
seaborn: High-level layer on top of matplotlib, easier API and beautiful defaults for common visualizations
ggplot: For those who prefer R-like plotting (API and appearance)
plotly: 2D and 3D interactive plots in the browser as a web service
Bokeh: targets modern web browsers and big data
pyqtgraph: Qt embedding, realtime plots
Others: pygal, mpld3, bqplot...
Use the best tool for the job! And in case of doubt, just get matplotlib :)
In [2]:
import numpy as np
import pandas as pd
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
df
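A few typical operations on such a frame, as a minimal sketch: the seed, the date range selected, and the column used for filtering are arbitrary choices made here for illustration.

```python
import numpy as np
import pandas as pd

# A seeded generator makes the random values reproducible.
rng = np.random.default_rng(0)
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list('ABCD'))

# Label-based slicing on the date index includes both endpoints.
subset = df.loc['2013-01-02':'2013-01-04']

# Boolean indexing keeps only the rows where column A is positive.
positive_a = df[df['A'] > 0]

# Column-wise aggregates and summary statistics.
means = df.mean()
summary = df.describe()

print(subset)
print(means)
```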
Many possibilities!
And by the way, a bit too optimistic:
setuptools is back
easy_install is not gone
"Prediction is very difficult, especially about the future."
Per Python ad astra!